<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language</title>
    <link rel="stylesheet" href="w3.css">
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" integrity="sha384-9aIt2nRpC12Uk9gS9baDl411NQApFmC26EwAOH8WgZl5MYYxFfc+NcPb1dKGj7Sk" crossorigin="anonymous">
    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js" integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.0/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js" integrity="sha384-OgVRvuATP1z7JjHLkuOU7Xw704+h835Lr+6QL9UvYjZE3Ipu6Tp75j7Bh/kR0JKI" crossorigin="anonymous"></script>
</head>

<body>

<br/>
<br/>

<div class="w3-container" id="paper">
    <div class="w3-content" style="max-width:850px">
  
    <h2 align="center" id="title"><b>ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language</b></h2>
    <br/>

    <p align="center" id="title">European Conference on Computer Vision (ECCV), 2020.</p>

    <p align="center" class="center_text" id="authors">
        <a target="_blank" href="http://www.niessnerlab.org/members/zhenyu_chen/profile.html">Dave Zhenyu Chen</a><sup>1</sup>
        &nbsp;&nbsp;&nbsp;&nbsp;
        <a target="_blank" href="https://angelxuanchang.github.io/">Angel X. Chang</a><sup>2</sup>
        &nbsp;&nbsp;&nbsp;&nbsp;
        <a target="_blank" href="https://www.niessnerlab.org/members/matthias_niessner/profile.html">Matthias Nie&szlig;ner</a><sup>1</sup>
    </p>

    <p class="center_text" align="center">
        <sup>1</sup>Technical University of Munich
        &nbsp; &nbsp; &nbsp;
        <sup>2</sup>Simon Fraser University
    </p>

    <br>
        <h4 align="center" id="title"><b>Submit to our ScanRefer Localization Benchmark <a href="http://kaldir.vc.in.tum.de/scanrefer_benchmark/" target="__blank">here</a>!</b></h4>

        <br><center><a href="http://kaldir.vc.in.tum.de/scanrefer_benchmark/" target="__blank"><img src="teaser.png" style="max-width:100%" /></a></center><br>
        
        <h3 class="w3-left-align" id="video"><b>Introduction</b></h3>
        <p>
            We introduce the task of 3D object localization in RGB-D scans using natural language descriptions.
            As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object.
            To address this task, we propose ScanRefer, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings.
            This fused descriptor correlates language expressions with geometric features, enabling regression of the 3D bounding box of the target object.
            We also introduce the ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 ScanNet scenes.
            ScanRefer is the first large-scale effort to perform object localization via natural language expressions directly in 3D.
        </p>
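        <p>
            To illustrate what "fusing" proposal and sentence features can mean, here is a toy sketch in plain Python. It is not the ScanRefer implementation.
            It assumes (hypothetically) that each 3D object proposal already carries a feature vector and a bounding box regressed by some detector, that the sentence embedding has been projected to the same length as the proposal features, and it uses an element-wise product as a stand-in fusion, followed by a hypothetical linear scoring head with a sigmoid.
            The description is localized by returning the box of the highest-scoring proposal.
        </p>
        <pre class="w3-panel w3-leftbar w3-light-grey" style="white-space: pre-wrap; font-family: monospace; font-size: 11px">
import math

# Toy sketch of language-conditioned proposal scoring. NOT the ScanRefer code.
# Assumptions (hypothetical, for illustration only):
#   * each proposal is a (feature_vector, box) pair whose box was already
#     regressed by some 3D detector, laid out as (cx, cy, cz, dx, dy, dz);
#   * the sentence embedding has the same length as the proposal features;
#   * fusion is an element-wise product (a stand-in, not the paper's module).


def fuse(proposal_feat, sentence_emb):
    """Fused descriptor: element-wise product of proposal and sentence features."""
    return [p * s for p, s in zip(proposal_feat, sentence_emb)]


def score(fused, weights, bias=0.0):
    """Hypothetical scoring head: one linear layer followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def localize(proposals, sentence_emb, weights):
    """Score every proposal against the description; return the best box and all scores."""
    scores = [score(fuse(feat, sentence_emb), weights) for feat, _ in proposals]
    best = max(range(len(scores)), key=scores.__getitem__)
    return proposals[best][1], scores


if __name__ == "__main__":
    proposals = [
        ([1.0, 0.0], (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)),  # hypothetical "chair" proposal
        ([0.0, 1.0], (2.0, 0.0, 0.0, 1.0, 2.0, 1.0)),  # hypothetical "table" proposal
    ]
    box, scores = localize(proposals, [0.0, 1.0], weights=[1.0, 1.0])
    print(box)  # (2.0, 0.0, 0.0, 1.0, 2.0, 1.0)
</pre>
        <p>
            Changing the sentence embedding to [1.0, 0.0] makes the first proposal win instead. The element-wise product is chosen here because it lets the description change the ranking; a purely linear score over concatenated features would add the same sentence term to every proposal and could never change which one is selected.
        </p>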

        <h3 class="w3-left-align" id="video"><b>Video</b></h3>
        <p>
        <iframe width="850" height="480" src="https://www.youtube.com/embed/T9J5t-UEcNA" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
        </p>

        <h3 class="w3-left-align" id="video"><b>Browse</b></h3>
        <p>
            The ScanRefer data can be browsed online in your web browser. Learn more at <a href="http://kaldir.vc.in.tum.de:8080/apps/main" target="__blank">the ScanRefer Data Browser</a>.
            <br/> (For a better browsing experience, we recommend using Google Chrome.)
        </p>

        <center><a href="http://kaldir.vc.in.tum.de:8080/apps/main" target="__blank"><img src="browser.png" style="max-width:100%" /></a></center><br>

        <h3 class="w3-left-align" id="publication"><b>Publication</b></h3>
        European Conference on Computer Vision (ECCV), 2020. <br/>
        <a href="davezchen_eccv2020_scanrefer.pdf" target="__blank">Paper</a> | <a href="https://arxiv.org/abs/1912.08830" target="__blank">arXiv</a> | <a href="https://github.com/daveredrum/ScanRefer" target="__blank">Code</a>
        <center>
            <a href="davezchen_eccv2020_scanrefer.pdf" target="__blank"><img src="paper.jpg" style="max-width:100%" /></a>
        </center><br>

        If you find our project useful, please consider citing us:
        <pre class="w3-panel w3-leftbar w3-light-grey" style="white-space: pre-wrap; font-family: monospace; font-size: 11px">

@article{chen2020scanrefer,
    title={ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language},
    author={Chen, Dave Zhenyu and Chang, Angel X and Nie{\ss}ner, Matthias},
    journal={16th European Conference on Computer Vision (ECCV)},
    year={2020}
}

</pre>

        <h3 class="w3-left-align" id="dataset"><b>Dataset Download</b></h3>

        If you would like to access the ScanRefer dataset, please fill out the <a href="https://forms.gle/aLtzXN12DsYDMSXX6">ScanRefer Terms of Use Form</a>. Once your request is accepted, you will receive an email with the download link.

    </div>


</div>

<br/>
<br/>

</body>
</html>
